Abstract

We adopt an empirical approach to the characterization of the distribution of twin primes
within the set of primes, rather than in the set of all natural numbers. The occurrences of
twin primes in any finite sequence of primes behave like fixed-probability random events. As the
sequence of primes grows, the probability decreases as the reciprocal of the logarithm of the count
of primes to that point. The manner of the decrease is consistent with the Hardy-Littlewood Conjecture,
the Prime Number Theorem, and the Twin Prime Conjecture. Furthermore, our probabilistic
model is simply parameterized. We discuss a simple test which indicates the consistency of
the model extrapolated outside of the range in which it was constructed.
1 Introduction

Prime numbers, with their many wonderful properties, have been an intriguing subject of
mathematical investigation since ancient times. The "twin primes," pairs of prime numbers {p, p + 2},
are a subset of the primes and themselves possess remarkable properties. In particular, we note
that the Twin Prime Conjecture, that there exists an infinite number of these prime number pairs
which differ by 2, is not yet proven.

In recent years much human labor and computational effort have been expended on the subject
of twin primes. The general aims of these researches have been three-fold: the task of enumerating
the twin primes (i.e., identifying the members of this particular subset of the natural numbers,
and its higher-order variants, "k-tuples" of primes), the attempt to elucidate how twin primes
are distributed among the natural numbers (especially searches for long gaps in the
sequence), and finally, the precise estimation of the value of Brun's Constant.

Many authors have observed that the twin primes, along with the primes themselves, generally 
become more sparse or diffuse as their magnitude increases. In fact the Prime Number Theorem 
may be rephrased to state that the number of prime numbers less than or equal to some large (not 
necessarily prime) number N is approximately^1

$$ \pi_1(N) \approx \int_2^N \frac{dx}{\ln(x)} . \tag{1} $$

A similar result is believed to hold for the number of twin primes where each element of the pair
is less than or equal to large N,

$$ \pi_2(N) \approx 2 c_2 \int_2^N \frac{dx}{(\ln(x))^2} , \tag{2} $$

where the "twin prime" constant [13] c_2 has the numerical value c_2 = 0.6601618..., and is currently
known to many decimal places [14, 15]. The expression is the first instance of the Hardy-Littlewood
conjectures, which estimate the multiplicity of k-tuples of primes smaller than a natural
number N.

* patrickJselly@ndsu.nodak.edu
tterry@mailaps.org

^1 In fact, more accurate approximations are known, but the formulae we quote suffice for our purposes.
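As a rough numerical illustration of estimates (1) and (2), the following sketch evaluates both integrals by Simpson's rule after the substitution x = e^u. The function names, the number of integration steps, and the value assigned to the twin prime constant (0.6601618158, the commonly tabulated figure) are assumptions of this example and are not part of our analysis.

```python
import math

# Twin prime constant c_2; the commonly tabulated value (assumed here).
C2 = 0.6601618158


def simpson(f, a, b, n=20000):
    """Composite Simpson's rule for the integral of f over [a, b]."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of sub-intervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pi1_estimate(N):
    """Estimate (1): integral of dx / ln(x) from 2 to N, using x = e^u."""
    return simpson(lambda u: math.exp(u) / u, math.log(2), math.log(N))


def pi2_estimate(N):
    """Estimate (2): 2 c_2 times the integral of dx / ln(x)^2 from 2 to N."""
    integral = simpson(lambda u: math.exp(u) / u ** 2, math.log(2), math.log(N))
    return 2 * C2 * integral


if __name__ == "__main__":
    N = 10 ** 6
    print(pi1_estimate(N), pi2_estimate(N))
```

For N = 10^6 the two estimates can be compared with the known counts of 78498 primes and 8169 twin prime pairs; both should agree to within about one percent.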

In our investigation, we sought to account for the effect of the primes themselves becoming 
more rarefied by examining the distribution of twin primes within the prime numbers. We have 
done this for the set of prime numbers less than approximately 4 × 10^9, and have observed that
within this range the occurrence of twin primes may be characterized as slowly-varying-probability 
random events. 



2 Method and Results 

To ensure that there is no confusion, let us first make very clear our methodology. We generated
prime numbers in sequence, viz., P_1 = 2, P_2 = 3, P_3 = 5, ..., and within this sequence identified
twin primes and their prime separations as illustrated somewhat schematically below.

... P_i (P_{i+1} P_{i+2}) P_{i+3} P_{i+4} (P_{i+5} P_{i+6}) P_{i+7} P_{i+8} P_{i+9} P_{i+10} (P_{i+11} P_{i+12}) P_{i+13} ...

We say that the first pair of twins in the above sequence has a prime separation of 2: there
are two non-twin prime numbers, i.e., singletons, which occur between the second prime element,
P_{i+2}, of the first twin and the first prime element, P_{i+5}, of the subsequent twin. Similarly, the
second pair of twins has prime separation 4. Note that there are many twins with prime separation
equal to zero: for example (5 7)(11 13), or (137 139)(149 151). In fact, every prime 4-tuple
(P, P + 2, P + 6, P + 8) consists of a pair of twins with zero separation in primes.

There is an irregularity with our definition of separation for the first few primes: 2 (3 5) (5 7), 
where the pairs in fact overlap, yielding a prime separation of -1. Fortunately, for very well-known
reasons such overlapping twins do not ever recur and we choose to begin our analysis with the twin 
(5 7). For instance, in the set of seven twins between 5 and 100, 

{(5 7)(11 13)(17 19)(29 31)(41 43)(59 61)(71 73)}, 

there are 6 separations. Two of these happen to be 0, three are 1, and one is 2, so the relative
frequencies for separations s = 0, 1, 2 are 1/3, 1/2, and 1/6, respectively.
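The counting procedure just described can be sketched in a few lines. This is only an illustration under our stated conventions (start at the twin (5 7), count the singleton primes strictly between consecutive twins); the function names `primes_up_to` and `twin_separations` are hypothetical, and a simple sieve stands in for whatever prime generator one prefers.

```python
from collections import Counter


def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n, in increasing order."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Cross out the multiples of i starting from i*i
            sieve[i * i::i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]


def twin_separations(n):
    """Prime separations between consecutive twins up to n, starting at (5 7)."""
    primes = primes_up_to(n)
    # Index k of the first member of each twin (primes[k], primes[k+1])
    firsts = [k for k in range(len(primes) - 1)
              if primes[k] >= 5 and primes[k + 1] - primes[k] == 2]
    # Singletons lie strictly between index k+1 (second member of one twin)
    # and index j (first member of the next twin): j - (k + 1) - 1 of them.
    return [j - k - 2 for k, j in zip(firsts, firsts[1:])]


if __name__ == "__main__":
    seps = twin_separations(100)
    print(seps)            # separations in order of occurrence
    print(Counter(seps))   # how often each separation occurs
```

Applied to the primes below 100, this reproduces the six separations of the example above.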

From the set of primes, all separations between pairs of twins up to a fixed number N were
computed and tabulated. [We chose certain values of N in the range 79561 to 4020634603. These
particular numbers are the second prime elements of the thousandth and twelve-millionth twins
respectively. Many of our N's were chosen such that our analysis started and ended on twins.
However, the behavior that we observe holds for all sufficiently large N, with the understanding
that the singleton primes between the last twin and the upper bound N are ignored.] The logarithm
of the relative frequency of occurrence of each separation in each of our analyses appears to obey
a surprisingly simple linear relation, as illustrated schematically in Figure 1.

Two comments must be made. The first is that this remarkable behavior is perfectly characteristic
of a completely random system. We infer that as one approaches each prime member in the
sequence of primes following a twin, the likelihood of it being the first member of the next twin
prime is constant! By way of analogy, we can consider a radioactive substance. The likelihood of
one of its atoms decaying in any short time interval is fixed, with the effect that the probability
that the next decay occurs at time t is just $\mathcal{N} e^{-\gamma t}$, where $\gamma$ is the decay rate
and $\mathcal{N}$ is a constant to ensure appropriate normalization. The measured slope of the line fit to our data provides a
decay constant which is particular to the twin primes. Again, recourse to our analogy is warranted.
The decay rate of a radioactive element is a defining characteristic of that element. We appear to
have the occurrences of twin primes governed by a similar sort of "prime constant."
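The fixed-probability picture can be illustrated with a small simulation. The sketch below is a toy model only, not our prime data: it assumes that each prime following a twin begins the next twin with the same probability p, independently of everything else, and the names `simulate_separations` and `log_relative_frequencies` are hypothetical.

```python
import math
import random
from collections import Counter


def simulate_separations(p, n_twins, seed=0):
    """Draw n_twins separations from a fixed-probability (Bernoulli) process."""
    rng = random.Random(seed)
    separations = []
    for _ in range(n_twins):
        s = 0
        # Each trial is one prime; with probability 1 - p it is a singleton.
        while rng.random() >= p:
            s += 1
        separations.append(s)
    return separations


def log_relative_frequencies(separations):
    """Map each observed separation to the log of its relative frequency."""
    counts = Counter(separations)
    total = len(separations)
    return {s: math.log(c / total) for s, c in sorted(counts.items())}


if __name__ == "__main__":
    p = 0.1
    logf = log_relative_frequencies(simulate_separations(p, 200000))
    # Successive differences of log-frequency should hover near log(1 - p).
    for s in range(5):
        print(s, round(logf[s + 1] - logf[s], 4), round(math.log(1 - p), 4))
```

In this discrete setting the log-frequencies fall on a line of slope ln(1 - p), which approaches -p when p is small, in the same way that the radioactive decay law gives a straight line on a logarithmic plot.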

We qualify this statement, however, since the curve in Figure 1 is only illustrative: the magnitude
of the slope decreases as N increases. Figure 2 displays curves (with associated best-fit straight lines) for
the twin prime separation data for three ranges of the form [5, N], illustrating the variation
with N.

[Figure 1 plot: log(frequency) vs. Separation in Primes, with fit lines -0.10268*x + log(0.10268) and -0.10644*x + log(0.10644).]

Figure 1: Data and Linear Fits for log(frequency) vs. separation, for a single value of N.

Were it not the case that the magnitude of the slope diminished for larger values of N, then the
Hardy-Littlewood Conjecture for twins would certainly fail to hold. If the slopes of our lines were
indeed the same value for all N, meaning that the probability of a given prime being a member of
a twin pair is a universal constant, then the number of twin primes π_2(N) would just be a fixed
fraction of π_1(N), in disagreement with Hardy-Littlewood and the empirical data.

The linear fits that we employed were constrained to ensure that the relative frequencies are 
properly normalized. That is, if the relative frequency with which separation s occurs obeys the 
exponential relation consistent with our data, then for the sum of frequencies to be normalized to 
1, we must have 

$$ (\text{intercept}) = \ln(-(\text{slope})) , \tag{3} $$



so the fit that we performed was constrained by the one-parameter Ansatz

$$ f(s) = -m s + \ln(m) . \tag{4} $$
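A one-parameter fit of the form (4) might be carried out as in the following sketch. It is an illustration rather than a record of our exact procedure: the weighting by relative frequency, the search interval for m, and the coarse scan followed by a golden-section refinement are all assumptions of the example, and `constrained_fit` is a hypothetical name.

```python
import math


def constrained_fit(separations, frequencies, lo=1e-4, hi=1.0, grid=2000, iters=80):
    """Least-squares fit of log(frequency) = -m*s + log(m) for the single parameter m.

    Each point is weighted by its relative frequency (an assumed weighting).
    """
    pts = [(s, math.log(f), f) for s, f in zip(separations, frequencies) if f > 0]

    def cost(m):
        return sum(w * (y + m * s - math.log(m)) ** 2 for s, y, w in pts)

    # Coarse scan to bracket the minimum of the cost function.
    step = (hi - lo) / grid
    best = min(range(grid + 1), key=lambda i: cost(lo + i * step))
    a = max(lo, lo + (best - 1) * step)
    b = min(hi, lo + (best + 1) * step)

    # Golden-section refinement inside the bracket [a, b].
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if cost(c) < cost(d):
            b = d
        else:
            a = c
    return (a + b) / 2


if __name__ == "__main__":
    # Synthetic data lying exactly on the Ansatz with m = 0.1
    seps = list(range(60))
    freqs = [0.1 * math.exp(-0.1 * s) for s in seps]
    print(constrained_fit(seps, freqs))
```

Because the intercept is tied to the slope, only the single number m is adjusted; an unconstrained two-parameter line would in general not give frequencies that sum to one.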



Table 1 gives the best-fit values of this probability-conserving slope (m) for various N, along with
simple statistical estimates of the uncertainty. The quoted error estimates measure only the quality
of the estimate of m and do not account for effects resulting from our arbitrary choices of
N. It may very well be the case that a more realistic assessment of error would double the values
quoted in the table.

Two further comments must be made. The first is that all of the separations which appeared in the
data received equal (frequency-weighted) consideration in our computation of best-fit slopes. This
has a consequence insofar as the large-separation, low-frequency events constituting the tail of the
distribution reduce the magnitude of the slope, as is readily seen in Figures 1 and 2. One might
well be inclined to truncate the data by excising the tails and fixing the slopes by the (more
strongly linear) low-separation data for each N. We did not do this because it would have entailed
a generally systematic discarding of data from pairs appearing near the upper limit of the range,
and thus would nearly correspond to the slope with greater magnitude that one would expect
associated with an effective upper limit below N. Viewed from this perspective, it is better to
consider all points rather than submit to this degree of uncertainty. The second comment is that
we have thus far adhered to the convention of expressing all of our results in terms of the natural
number N. We shall now pass over to a characterization in terms of π_1 (itself a function of N),
which better suits our viewpoint that the analysis of the distribution of twins is most meaningful
when considered in terms of the primes themselves.

Figure 2: Data and Linear Fits for log(frequency) vs. separation, for three values of N.

In light of the above comments, we sketch in Figure 3 a plot of the slopes (computed in the
manner described) versus log(π_1). The trend seen on the graph may be well described by the
function (remember that the error bars are understated)

$$ -m(x) \approx -(1.321 \pm 0.010)/x , \quad \text{for } x = \log(\pi_1(N)) . \tag{5} $$
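To give a feeling for the quality of the fit (5), the sketch below compares 1.321 / log(π_1) with the fitted slopes for those rows of Table 1 in which both π_1(N) and the slope are legible. The helper names are hypothetical, and the central value 1.321 is used without its quoted uncertainty.

```python
import math

# (pi_1(N), fitted slope m) pairs transcribed from Table 1.
TABLE_1 = [
    (7793, 0.141667),
    (556396, 0.104126),
    (1175775, 0.096421),
    (6596231, 0.086700),
    (13804822, 0.081143),
    (44214960, 0.075491),
    (75860671, 0.073150),
    (124538861, 0.070965),
    (157523559, 0.070154),
    (190894477, 0.069814),
]


def empirical_slope(pi1, coeff=1.321):
    """Magnitude of the slope predicted by the empirical form (5)."""
    return coeff / math.log(pi1)


def relative_deviations(rows):
    """Signed relative difference between the form (5) and each fitted slope."""
    return [(empirical_slope(pi1) - m) / m for pi1, m in rows]


if __name__ == "__main__":
    for (pi1, m), dev in zip(TABLE_1, relative_deviations(TABLE_1)):
        print(f"{pi1:>10d}  fitted {m:.6f}  form (5) {empirical_slope(pi1):.6f}  {dev:+.2%}")
```

The deviations stay within a few percent over the whole table and shrink toward the upper end of the range, which is in keeping with our remark that the quoted statistical errors are understated.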

There are two amazing features of this functional form for the dependence of the slope on π_1. The
first is that the factor which appears looks suspiciously like -2c_2, with c_2 the twin primes constant! This
result will be confirmed in the next section. The second feature is that

$$ \lim_{x \to \infty} -m(x) = 0^- , $$

i.e., as one progresses through the infinite set of primes, the slope which governs the distribution of
twins does not crash through zero. This is consistent with the Hardy-Littlewood conjecture insofar
as the twins become progressively more sparse within the set of primes. In addition, it is consistent
with the Twin Prime Conjecture in that the reciprocal of the slope admits the interpretation of
being the "expected number of primes interspersed between a given twin and the next twin in
order,"

$$ s = \frac{1}{m} , \quad \text{for random fixed-probability events.} \tag{6} $$

Since s remains finite for all π_1, we can conjecture that wherever one happens to be in the
infinite set of primes it is possible to characterize the expected number of primes that will be
encountered on the way to the next twin. This is also consistent with the Twin Prime Conjecture,
although unfortunately it is empirical and does not constitute a proof.

π_2(N)       slope m     statistical error   π_1(N)       N
1 × 10^3     0.141667    0.00599             7793         79561
5 × 10^3     0.122415    --                  --           --
5 × 10^4     0.104126    0.00105             556396       8264959
1 × 10^5     0.096421    --                  1175775      18409201
5 × 10^5     0.086700    0.00056             6596231      115438669
1 × 10^6     0.081143    0.00041             13804822     252427603
3 × 10^6     0.075491    0.00035             44214960     863029303
5 × 10^6     0.073150    0.00031             75860671     1523975911
8 × 10^6     0.070965    0.00032             124538861    2566997821
1 × 10^7     0.070154    0.00029             157523559    3285916171
1.2 × 10^7   0.069814    0.00024             190894477    4020634603

Table 1: Values of slope, statistical error, π_1(N), and π_2(N) for certain N from 79561 to
4020634603.



3 Interpretive Framework 

Recall that, up to this point, all of our results have been purely empirical. Now, we will argue for 
their essential truth and consistency beyond the range of our data. 

Consider our approximation (1) for π_1(N). Making the additional draconian approximation
that the integrand is constant at its minimum value, and discarding a small term, we get the
oft-quoted estimate

$$ \pi_1(N) \approx \frac{N}{\log(N)} . \tag{7} $$

In precisely the same manner we get

$$ \pi_2(N) \approx 2 c_2 \frac{N}{(\log(N))^2} . \tag{8} $$

The entire set of prime numbers less than N consists of the 2 × π_2(N) elements which occur
together in twins and π_1(N) - 2 × π_2(N) singletons. Now, let us suppose that the set of twins is
randomly interspersed among the singletons. This would imply that between each twin pair there
will appear, on average, s_0(N) singletons, where

$$ s_0(N) = \frac{\pi_1(N) - 2\pi_2(N)}{\pi_2(N)} . \tag{9} $$

Note that there is an essential distinction between s, which arises from the actual distribution of
separations, and s_0, which, in effect, assumes that the twins are evenly spaced. That is, from a
value of s(π_1(N)) one can infer m and thus the probability distribution of twin prime separations
characteristic of the set of primes less than N. On the other hand, s_0(N) is an average value
in which no account is taken of the details of the distribution and thus no more information is
contained in it.

In the approximation scheme developed in (7) and (8),

$$ s_0 \approx \frac{\log(N) - 4c_2}{2c_2} \approx \frac{\log(N)}{2c_2} . \tag{10} $$

Furthermore, to very lowest order

$$ \log(N) \approx \log(\pi_1) + \log(\log(\pi_1)) , \tag{11} $$

and hence

$$ s_0 \approx \frac{\log(\pi_1)}{2c_2} . \tag{12} $$

Finally, taking this as the expected number of singleton primes occurring between twin prime pairs
for numbers less than or equal to N we see immediately that

$$ m = \frac{1}{s_0} \approx \frac{2c_2}{\log(\pi_1)} \tag{13} $$

is completely consistent with the Prime Number Theorem, the Hardy-Littlewood Conjecture, and
with our empirical results.

Figure 3: Slope Data and our empirical fit (5) vs. log(π_1).

As an aside, one might consider the effect of attempting to improve upon the draconian approximation.
It turns out that any reasonable improvement merely results in the addition of (small)
constant terms which may be neglected in the limit of large N.

We are quite surprised that our empirical results yield the large N limit with such accuracy. 

Another test of the general consistency of our model is by comparison with m_0, where

$$ m_0(N) = \frac{1}{s_0(N)} = \frac{\pi_2(N)}{\pi_1(N) - 2\pi_2(N)} . \tag{14} $$

Bearing in mind that we are modeling more accurately the actual distribution of prime separations
with s than with s_0, we do not expect perfect agreement, but rather that the general trend exposed
by s_0 will be followed by s if investigated beyond the range thus far examined. We sketch below a
plot of m_0 vs. log(π_1) using precise values for π_1 and π_2 computed by T.R. Nicely. Note that
we have made the small adjustments of decrementing the published π_2's by one and decrementing
the π_1 by two to take into account our skipping the anomalous prime 2 and twin (3 5). We are
quite encouraged by the correspondence of the data on the graph, and believe that the distributional
model does extend itself well beyond the range of our present data.
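The comparison can be sketched for a few values of N beyond our own range. The counts used below are the widely tabulated values of π_1 and π_2 at N = 10^9, 10^10 and 10^12; they are quoted here as an assumption of the example rather than read from our figure, and the function names are hypothetical.

```python
import math

# (N, pi_1(N), pi_2(N)): widely tabulated counts, assumed for this illustration.
COUNTS = [
    (10 ** 9, 50847534, 3424506),
    (10 ** 10, 455052511, 27412679),
    (10 ** 12, 37607912018, 1870585220),
]


def m0(pi1, pi2):
    """Equation (14), after skipping the primes 2 and 3 and the twin (3 5)."""
    pi1_adj = pi1 - 2
    pi2_adj = pi2 - 1
    return pi2_adj / (pi1_adj - 2 * pi2_adj)


def empirical_slope(pi1, coeff=1.321):
    """Magnitude of the slope from the empirical form (5)."""
    return coeff / math.log(pi1)


def relative_gaps(rows):
    """Relative amount by which m_0 exceeds the empirical form at each N."""
    gaps = []
    for _, pi1, pi2 in rows:
        value = m0(pi1, pi2)
        gaps.append((value - empirical_slope(pi1 - 2)) / value)
    return gaps


if __name__ == "__main__":
    for (N, pi1, pi2), gap in zip(COUNTS, relative_gaps(COUNTS)):
        print(f"N = {N:.0e}  m_0 = {m0(pi1, pi2):.6f}  gap = {gap:.2%}")
```

For these three values m_0 lies a few percent above the empirical form, and the gap narrows as N grows, which is the kind of trend-following, without perfect agreement, described above.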
Figure 4: m_0 using Nicely Data and our empirical fit vs. log(π_1).

4 Conclusion 

We believe that we have constructed a novel characterization of the distribution of twin primes.
The most essential feature of our approach is that we consider the spacings of twins among the
primes themselves, rather than among the natural numbers. Secondly, we modeled the distribution
empirically, without preconceptions, and argued that for any given N (larger than 10^4, say) the
twin primes appear amongst the sequence of primes in a manner characteristic of a completely
random, fixed-probability system. Again working empirically, we noted that the "fixed" probability
varied with N, in a manner consistent with Theorems and with Conjectures that are believed to
hold. We have parameterized the variation of the "separation constant" in terms of π_1, as suggested
by our outlook, and have discovered that it has a particularly simple functional form and is also
consistent with the established Theorems and Conjectures.

With this model for the distribution now in hand and assumed viable, we are beginning to
investigate other consequences. These will be reported upon in a forthcoming paper.